Decoding billions of integers per second through vectorization
In many important applications -- such as search engines and relational
database systems -- data is stored in the form of arrays of integers. Encoding
and, most importantly, decoding of these arrays consumes considerable CPU time.
Therefore, substantial effort has been made to reduce costs associated with
compression and decompression. In particular, researchers have exploited the
superscalar nature of modern processors and SIMD instructions. Nevertheless, we
introduce a novel vectorized scheme called SIMD-BP128 that improves over
previously proposed vectorized approaches. It is nearly twice as fast as the
previously fastest schemes on desktop processors (varint-G8IU and PFOR). At the
same time, SIMD-BP128 saves up to 2 bits per integer. For even better
compression, we propose another new vectorized scheme (SIMD-FastPFOR) that has
a compression ratio within 10% of a state-of-the-art scheme (Simple-8b) while
being two times faster during decoding.

Comment: For software, see https://github.com/lemire/FastPFor; for data, see http://boytsov.info/datasets/clueweb09gap
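The core idea behind binary packing schemes such as SIMD-BP128 can be sketched in plain (scalar, unvectorized) Python: store the deltas of a sorted integer array using only as many bits as the largest delta requires. This is only an illustration of the packing layout, not the paper's SIMD implementation.

```python
# Scalar sketch of binary packing, the idea underlying SIMD-BP128:
# each block of integers is stored using b bits per value, where b is
# the bit width of the largest value in the block. (The paper packs
# 128-integer blocks with SIMD instructions; this sketch is scalar.)

def pack(values, b):
    """Pack each value into b bits of a single integer buffer."""
    buf = 0
    for i, v in enumerate(values):
        assert v < (1 << b), "value does not fit in b bits"
        buf |= v << (i * b)
    return buf

def unpack(buf, b, n):
    """Recover n values of b bits each."""
    mask = (1 << b) - 1
    return [(buf >> (i * b)) & mask for i in range(n)]

# Deltas of a sorted posting list are small, so few bits suffice.
postings = [3, 7, 12, 13, 20]
deltas = [postings[0]] + [y - x for x, y in zip(postings, postings[1:])]
bits = max(deltas).bit_length()   # 3 bits per value instead of 32
packed = pack(deltas, bits)
assert unpack(packed, bits, len(deltas)) == deltas
```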
Fast Hands-free Writing by Gaze Direction
We describe a method for text entry based on inverse arithmetic coding that
relies on gaze direction and which is faster and more accurate than using an
on-screen keyboard.
These benefits are derived from two innovations: the writing task is matched
to the capabilities of the eye, and a language model is used to make
predictable words and phrases easier to write.

Comment: 3 pages. Final version
Investigating five-key predictive text entry with combined distance and keystroke modelling
This paper investigates text entry on mobile devices using only five keys. Primarily intended to support text entry on devices smaller than mobile phones, the method can also be used to maximise screen space on mobile phones. The reported combined Fitts' law and keystroke modelling predicts performance with bigram prediction on a five-key keypad similar to that currently achieved on standard mobile phones using unigram prediction. User studies reported here show user performance on five-key pads similar to that found elsewhere for novice nine-key pad users.
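Fitts' law, the movement-time model combined with keystroke-level modelling in this line of work, predicts how long it takes to reach a key of a given size at a given distance. The constants a and b below are illustrative, not the paper's fitted values.

```python
# Fitts' law (Shannon formulation): MT = a + b * log2(D / W + 1).
# Constants a and b are illustrative placeholders.
import math

def fitts_time(distance, width, a=0.1, b=0.15):
    """Predicted movement time in seconds to a target of the given
    width at the given distance."""
    return a + b * math.log2(distance / width + 1)

# Moving four times as far to a same-sized key costs well under four
# times the time, which is why compact keypads can stay competitive.
t_near = fitts_time(distance=1.0, width=1.0)   # adjacent key
t_far = fitts_time(distance=4.0, width=1.0)    # far key
assert t_far < 2 * t_near
```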
Towards an automated classification of spreadsheets
Many spreadsheets in the wild have neither documentation nor a category associated with them. This makes it difficult to apply spreadsheet research that targets specific domains such as finance or databases. In this paper we introduce a methodology to automatically classify spreadsheets into different domains, exploiting existing data-mining classification algorithms with spreadsheet-specific features. The algorithms were trained and validated with cross-validation on the EUSES corpus, reaching up to 89% accuracy. The best algorithm was then applied to the larger Enron corpus, both to gain insight from it and to demonstrate the usefulness of this work.
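The pipeline described above can be sketched with an off-the-shelf classifier and cross-validation. The feature names and toy data below are invented for illustration; the paper trains on spreadsheet-specific features extracted from the EUSES corpus.

```python
# Hedged sketch: train a standard data-mining classifier on
# spreadsheet-specific features and validate with cross-validation.
# Features and rows below are hypothetical, not from the paper.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical features per spreadsheet:
# [n_formulas, n_numeric_cells, n_text_cells, max_sheet_width]
X = [
    [120, 900, 40, 12],   # financial-looking sheets
    [95, 700, 55, 10],
    [110, 850, 30, 14],
    [5, 40, 600, 25],     # database-looking sheets
    [8, 60, 550, 30],
    [3, 30, 640, 22],
]
y = ["financial", "financial", "financial",
     "database", "database", "database"]

clf = RandomForestClassifier(n_estimators=50, random_state=0)
scores = cross_val_score(clf, X, y, cv=3)   # 3-fold cross-validation
print(scores.mean())
```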
Learning Aligned-Spatial Graph Convolutional Networks for Graph Classification
In this paper, we develop a novel Aligned-Spatial Graph Convolutional Network (ASGCN) model to learn effective features for graph classification. Our idea is to transform arbitrary-sized graphs into fixed-sized aligned grid structures, and define a new spatial graph convolution operation associated with the grid structures. We show that the proposed ASGCN model not only reduces the problems of information loss and imprecise information representation arising in existing spatially-based Graph Convolutional Network (GCN) models, but also bridges the theoretical gap between traditional Convolutional Neural Network (CNN) models and spatially-based GCN models. Moreover, the proposed ASGCN model can adaptively discriminate the importance between specified vertices during the process of spatial graph convolution, which helps explain its effectiveness. Experiments on standard graph datasets demonstrate the effectiveness of the proposed model.
Identifying Critical States by the Action-Based Variance of Expected Return
The balance of exploration and exploitation plays a crucial role in
accelerating reinforcement learning (RL). To deploy an RL agent in human
society, its explainability is also essential. However, basic RL approaches
have difficulties in deciding when to choose exploitation as well as in
extracting useful points for a brief explanation of its operation. One reason
for the difficulties is that these approaches treat all states the same way.
Here, we show that identifying critical states and treating them specially is
commonly beneficial to both problems. These critical states are the states at
which the action selection changes the potential of success and failure
substantially. We propose to identify the critical states using the variance in
the Q-function for the actions and to perform exploitation with high
probability on the identified states. These simple methods accelerate RL in a
grid world with cliffs and two baseline tasks of deep RL. Our results also
demonstrate that the identified critical states are intuitively interpretable
regarding the crucial nature of the action selection. Furthermore, our analysis
of the relationship between the timing of the identification of especially
critical states and the rapid progress of learning suggests there are a few
especially critical states that have important information for accelerating RL
rapidly.

Comment: 12 pages, 6 figures
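The criterion described above can be sketched directly: flag a state as critical when the variance of its Q-values across actions is high, then exploit (act greedily) with high probability on those states. The threshold and exploration rates below are illustrative, not the paper's settings.

```python
# Minimal sketch of identifying critical states by the action-based
# variance of expected return, and exploiting on them. Threshold and
# epsilon values are illustrative placeholders.
import random
import statistics

def is_critical(q_values, threshold=1.0):
    """Variance of the Q-function over actions at this state."""
    return statistics.pvariance(q_values) > threshold

def select_action(q_values, eps=0.3, eps_critical=0.01):
    """Epsilon-greedy, but near-pure exploitation on critical states."""
    eps_here = eps_critical if is_critical(q_values) else eps
    if random.random() < eps_here:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# A cliff-edge state: one action is much worse than the others, so the
# action choice changes the potential of success and failure.
assert is_critical([0.9, 0.8, -5.0])       # high variance -> critical
assert not is_critical([0.5, 0.45, 0.55])  # similar returns -> not critical
```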
Performing Feature Selection with ACO
The main aim of feature selection (FS) is to determine a minimal feature subset from a problem domain while retaining a suitably high accuracy in representing the original features. In real-world problems FS is a must due to the abundance of noisy, irrelevant, or misleading features; however, current methods are inadequate at finding optimal reductions. This chapter presents a feature selection mechanism based on Ant Colony Optimization (ACO) in an attempt to combat this. The method is then applied to the problem of finding optimal feature subsets in the fuzzy-rough data reduction process. The present work is applied to two very different challenging tasks, namely web classification and complex systems monitoring.
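A minimal ACO loop for feature selection can be sketched as follows, assuming a caller-supplied fitness function (e.g. the accuracy of a classifier on the candidate subset). The pheromone update rule here is a basic evaporation-plus-reinforcement scheme; the chapter's fuzzy-rough evaluation is not reproduced.

```python
# Illustrative Ant Colony Optimization for feature selection: ants
# sample feature subsets with probability driven by pheromone levels,
# and the best subset found so far reinforces its features' pheromone.
import random

def aco_feature_selection(n_features, fitness, n_ants=10, n_iters=20,
                          rho=0.1, seed=0):
    rng = random.Random(seed)
    pheromone = [1.0] * n_features
    best_subset, best_fit = None, float("-inf")
    for _ in range(n_iters):
        total = sum(pheromone)
        for _ in range(n_ants):
            # Each ant includes feature i with probability proportional
            # to its share of the pheromone.
            subset = [i for i in range(n_features)
                      if rng.random() < pheromone[i] / total * n_features * 0.5]
            if not subset:
                continue
            fit = fitness(subset)
            if fit > best_fit:
                best_subset, best_fit = subset, fit
        # Evaporate, then reinforce the features of the best subset.
        pheromone = [(1 - rho) * p for p in pheromone]
        for i in best_subset or []:
            pheromone[i] += rho * best_fit
    return best_subset, best_fit

# Toy fitness: reward subsets containing features 1 and 3, penalize size.
fitness = lambda s: (1 in s) + (3 in s) - 0.1 * len(s)
subset, fit = aco_feature_selection(6, fitness)
```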
Bekenstein entropy bound for weakly-coupled field theories on a 3-sphere
We calculate the high temperature partition functions for SU(Nc) or U(Nc)
gauge theories in the deconfined phase on S^1 x S^3, with scalars, vectors,
and/or fermions in an arbitrary representation, at zero 't Hooft coupling and
large Nc, using analytical methods. We compare these with numerical results
which are also valid in the low temperature limit and show that the Bekenstein
entropy bound resulting from the partition functions for theories with any
amount of massless scalar, fermionic, and/or vector matter is always satisfied
when the zero-point contribution is included, while the theory is sufficiently
far from a phase transition. We further consider the effect of adding massive
scalar or fermionic matter and show that the Bekenstein bound is satisfied when
the Casimir energy is regularized under the constraint that it vanishes in the
large mass limit. These calculations can be generalized straightforwardly for
the case of a different number of spatial dimensions.

Comment: 32 pages, 12 figures. v2: Clarifications added. JHEP version
Investigating the effectiveness of client-side search/browse without a network connection
Search and browse, incorporating elements of information retrieval and database operations, are core services in most digital repository toolkits. These are often implemented using a server-side index, such as that produced by Apache Solr. However, sometimes a small collection needs to be static and portable, or stored client-side. It is proposed that, in these instances, browser-based search and browse is possible, using standard facilities within the browser. This was implemented and evaluated for varying behaviours and collection sizes. The results show that fast performance can be achieved for typical queries on small- to medium-sized collections.
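The client-side approach can be sketched as a small in-memory inverted index shipped with the static collection. A real deployment would build this in JavaScript inside the browser; Python is used here only to illustrate the data structure, and the documents are invented.

```python
# Sketch of a client-side search core: an inverted index mapping each
# term to the set of documents containing it, queried conjunctively.
from collections import defaultdict

def build_index(docs):
    """Map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Conjunctive (AND) query over the index."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

docs = {
    "d1": "digital repository search and browse",
    "d2": "client side search without a network connection",
    "d3": "server side index with Apache Solr",
}
index = build_index(docs)
assert search(index, "search") == {"d1", "d2"}
assert search(index, "side search") == {"d2"}
```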
Using deep learning for ordinal classification of mobile marketing user conversion
In this paper, we explore Deep Multilayer Perceptrons (MLP) to perform an ordinal classification of mobile marketing conversion rate (CVR), allowing the value of product sales to be measured when a user clicks an ad. As a case study, we consider big data provided by a global mobile marketing company. Several experiments were held, considering a rolling window validation, different datasets, learning methods and performance measures. Overall, competitive results were achieved by an online deep learning model, which is capable of producing real-time predictions.

This article is a result of the project NORTE-01-0247-FEDER-017497, supported by Norte Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (ERDF). This work was also supported by Fundação para a Ciência e Tecnologia (FCT) within the Project Scope: UID/CEC/00319/201
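The ordinal framing mentioned in the abstract above can be sketched with the standard cumulative-label encoding, where an ordinal label k is replaced by binary targets "is the label greater than i?". This illustrates the ordinal setup only; the paper's deep MLP models are not reproduced, and the bucket names are invented.

```python
# Hedged sketch of cumulative-label encoding for ordinal classes,
# a common way to cast ordinal classification as binary subproblems.

def to_cumulative(labels, n_classes):
    """Encode ordinal labels 0..n_classes-1 as n_classes-1 binary targets."""
    return [[1 if y > t else 0 for t in range(n_classes - 1)] for y in labels]

def from_cumulative(row):
    """Decode by counting how many thresholds are exceeded."""
    return sum(row)

# Three hypothetical CVR buckets: 0 = no sale, 1 = low, 2 = high value.
encoded = to_cumulative([0, 2, 1], n_classes=3)
assert encoded == [[0, 0], [1, 1], [1, 0]]
assert [from_cumulative(r) for r in encoded] == [0, 2, 1]
```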